OcrV1, Main, Exploration, bibRecord, 002101

SPEEDING UP CHINESE CHARACTER RECOGNITION IN AN AUTOMATIC DOCUMENT READING SYSTEM

Internal identifier: 002101 (Main/Exploration); previous: 002100; next: 002102

Authors: Yi-Hong Tseng [People's Republic of China]; Chi-Chang Kuo [People's Republic of China]; Hsi-Jian Lee [People's Republic of China, Taiwan]

Source:

Pattern Recognition [ 0031-3203 ] ; 1997.

RBID : ISTEX:9126B30AAE6172BC820ED69CF593CC0CD4A41104

Abstract

In this paper, we present two techniques for speeding up character recognition. Our character recognition system, which includes the candidate-cluster selection and modified branch-and-bound detail-matching modules, is implemented using two statistical features: crossing counts and contour-direction counts. In the training stage, we divide characters into different clusters by using reference characters. To achieve a very high recognition rate, the candidate-cluster selection module selects the top 60 clusters with minimal distances from among 300 predefined clusters. To further speed up recognition, we use a modified branch-and-bound algorithm in the detail-matching module. In the automatic document reading system, characters and punctuation marks are first extracted from printed document images and sorted according to their positions and the document orientation. The system then recognizes all printed Chinese characters between pairs of punctuation marks. The results are then spoken aloud by a speech-synthesis system. The character recognition system and the text-to-speech synthesis system are integrated in the Windows-based document reading system, which provides a user-friendly environment.
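
The two-stage scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the distance measure or how the branch-and-bound algorithm was modified, so the sketch assumes squared Euclidean distance and a simple partial-distance cutoff as the bounding step. All function names, variable names, and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (an assumed measure, not from the paper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_candidate_clusters(features: Vector,
                              cluster_refs: List[Vector],
                              k: int = 60) -> List[int]:
    """Return the indices of the k clusters whose reference vectors are nearest."""
    ranked = sorted(range(len(cluster_refs)),
                    key=lambda i: squared_distance(features, cluster_refs[i]))
    return ranked[:k]


def bounded_distance(a: Vector, b: Vector, bound: float) -> Optional[float]:
    """Accumulate the distance term by term; give up once it reaches the bound."""
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total >= bound:
            return None  # pruned: this template cannot beat the current best
    return total


def detail_match(features: Vector,
                 candidates: List[int],
                 templates: Dict[int, List[Tuple[str, Vector]]]
                 ) -> Tuple[Optional[str], float]:
    """Search only the candidate clusters, pruning with the best distance so far."""
    best_label, best = None, math.inf
    for cluster in candidates:
        for label, template in templates[cluster]:
            d = bounded_distance(features, template, best)
            if d is not None:
                best_label, best = label, d
    return best_label, best


# Toy data: 3 clusters instead of 300, keeping the top 2 instead of the top 60.
cluster_refs = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
templates = {
    0: [("A", [0.0, 1.0]), ("B", [1.0, 1.0])],
    1: [("C", [10.0, 9.0])],
    2: [("D", [5.0, 4.0])],
}
features = [0.9, 1.2]

candidates = select_candidate_clusters(features, cluster_refs, k=2)
label, distance = detail_match(features, candidates, templates)
print(candidates, label)
```

In this sketch the first stage limits the search to a fraction of the clusters, and the second stage abandons a template as soon as its running distance can no longer beat the best match found so far.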

Url:

https://api.istex.fr/document/9126B30AAE6172BC820ED69CF593CC0CD4A41104/fulltext/pdf

DOI: 10.1016/S0031-3203(98)00043-0

Affiliations:

People's Republic of China, Taiwan

Links toward previous steps (curation, corpus...)

to stream Istex, to step Corpus: 000192
to stream Istex, to step Curation: 000189
to stream Istex, to step Checkpoint: 001605
to stream Main, to step Merge: 002218
to stream Main, to step Curation: 002101

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title>SPEEDING UP CHINESE CHARACTER RECOGNITION IN AN AUTOMATIC DOCUMENT READING SYSTEM</title>
<author><name sortKey="Tseng, Yi Hong" sort="Tseng, Yi Hong" uniqKey="Tseng Y" first="Yi-Hong" last="Tseng">Yi-Hong Tseng</name>
</author>
<author><name sortKey="Kuo, Chi Chang" sort="Kuo, Chi Chang" uniqKey="Kuo C" first="Chi-Chang" last="Kuo">Chi-Chang Kuo</name>
</author>
<author><name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:9126B30AAE6172BC820ED69CF593CC0CD4A41104</idno>
<date when="1998" year="1998">1998</date>
<idno type="doi">10.1016/S0031-3203(98)00043-0</idno>
<idno type="url">https://api.istex.fr/document/9126B30AAE6172BC820ED69CF593CC0CD4A41104/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000192</idno>
<idno type="wicri:Area/Istex/Curation">000189</idno>
<idno type="wicri:Area/Istex/Checkpoint">001605</idno>
<idno type="wicri:doubleKey">0031-3203:1998:Tseng Y:speeding:up:chinese</idno>
<idno type="wicri:Area/Main/Merge">002218</idno>
<idno type="wicri:Area/Main/Curation">002101</idno>
<idno type="wicri:Area/Main/Exploration">002101</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a">SPEEDING UP CHINESE CHARACTER RECOGNITION IN AN AUTOMATIC DOCUMENT READING SYSTEM</title>
<author><name sortKey="Tseng, Yi Hong" sort="Tseng, Yi Hong" uniqKey="Tseng Y" first="Yi-Hong" last="Tseng">Yi-Hong Tseng</name>
<affiliation wicri:level="1"><country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan 30050</wicri:regionArea>
<wicri:noRegion>Taiwan 30050</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Kuo, Chi Chang" sort="Kuo, Chi Chang" uniqKey="Kuo C" first="Chi-Chang" last="Kuo">Chi-Chang Kuo</name>
<affiliation wicri:level="1"><country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan 30050</wicri:regionArea>
<wicri:noRegion>Taiwan 30050</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
<affiliation wicri:level="1"><country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan 30050</wicri:regionArea>
<wicri:noRegion>Taiwan 30050</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Taïwan</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Pattern Recognition</title>
<title level="j" type="abbrev">PR</title>
<idno type="ISSN">0031-3203</idno>
<imprint><publisher>ELSEVIER</publisher>
<date type="published" when="1997">1997</date>
<biblScope unit="volume">31</biblScope>
<biblScope unit="issue">11</biblScope>
<biblScope unit="page" from="1601">1601</biblScope>
<biblScope unit="page" to="1612">1612</biblScope>
</imprint>
<idno type="ISSN">0031-3203</idno>
</series>
<idno type="istex">9126B30AAE6172BC820ED69CF593CC0CD4A41104</idno>
<idno type="DOI">10.1016/S0031-3203(98)00043-0</idno>
<idno type="PII">S0031-3203(98)00043-0</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0031-3203</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">In this paper, we present two techniques for speeding up character recognition. Our character recognition system, which includes the candidate-cluster selection and modified branch-and-bound detail-matching modules, is implemented using two statistical features: crossing counts and contour-direction counts. In the training stage, we divide characters into different clusters by using reference characters. To achieve a very high recognition rate, the candidate-cluster selection module selects the top 60 clusters with minimal distances from among 300 predefined clusters. To further speed up recognition, we use a modified branch-and-bound algorithm in the detail-matching module. In the automatic document reading system, characters and punctuation marks are first extracted from printed document images and sorted according to their positions and the document orientation. The system then recognizes all printed Chinese characters between pairs of punctuation marks. The results are then spoken aloud by a speech-synthesis system. The character recognition system and the text-to-speech synthesis system are integrated in the Windows-based document reading system, which provides a user-friendly environment.</div>
</front>
</TEI>
<affiliations><list><country><li>République populaire de Chine</li>
<li>Taïwan</li>
</country>
</list>
<tree><country name="République populaire de Chine"><noRegion><name sortKey="Tseng, Yi Hong" sort="Tseng, Yi Hong" uniqKey="Tseng Y" first="Yi-Hong" last="Tseng">Yi-Hong Tseng</name>
</noRegion>
<name sortKey="Kuo, Chi Chang" sort="Kuo, Chi Chang" uniqKey="Kuo C" first="Chi-Chang" last="Kuo">Chi-Chang Kuo</name>
<name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
</country>
<country name="Taïwan"><noRegion><name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002101 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002101 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:9126B30AAE6172BC820ED69CF593CC0CD4A41104
   |texte=   SPEEDING UP CHINESE CHARACTER RECOGNITION IN AN AUTOMATIC DOCUMENT READING SYSTEM
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024
